A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness

Abstract

Computing workloads often contain a mix of interactive, latency-sensitive foreground applications and recurring background computations. To guarantee responsiveness, interactive and batch applications are often run on disjoint sets of resources, but this incurs additional energy, power, and capital costs. In this paper, we evaluate the potential of hardware cache partitioning mechanisms and policies to improve efficiency by allowing background applications to run simultaneously with interactive foreground applications, while avoiding degradation in interactive responsiveness. We evaluate these tradeoffs using commercial x86 multicore hardware that supports cache partitioning, and find that real hardware measurements with full applications provide different observations than past simulation-based evaluations. Co-scheduling applications without LLC partitioning leads to a 10% energy improvement and average throughput improvement of 54% compared to running tasks separately, but can result in foreground performance degradation of up to 34% with an average of 6%. With optimal static LLC partitioning, the average energy improvement increases to 12% and the average throughput improvement to 60%, while the worst case slowdown is reduced noticeably to 7% with an average slowdown of only 2%. We also evaluate a practical low-overhead dynamic algorithm to control partition sizes, and are able to realize the potential performance guarantees of the optimal static approach, while increasing background throughput by an additional 19%.
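
The paper's dynamic policy and its prototype partitioning hardware are not detailed in this abstract. As a rough illustration of the kind of low-overhead feedback loop it describes, the sketch below adjusts a background job's last-level-cache way allocation through the Linux resctrl interface (Intel Cache Allocation Technology), which is a different mechanism from the one evaluated in the paper. The group name `background`, the way counts, and the 5% slowdown budget are assumptions, and `foreground_slowdown()` is a stand-in for whatever foreground progress metric a real deployment exposes.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: grow or shrink a background LLC partition while
watching foreground progress. Assumes Linux resctrl (Intel CAT) is mounted
at /sys/fs/resctrl, that a `background` group has already been created with
mkdir and the background PIDs written to its `tasks` file, and that there is
a single L3 cache domain (id 0). This is not the prototype hardware or the
exact policy evaluated in the paper."""

import os
import time

RESCTRL = "/sys/fs/resctrl"
BG_GROUP = os.path.join(RESCTRL, "background")  # hypothetical group name
TOTAL_WAYS = 20          # assumed LLC way count; check resctrl's info dir
MIN_BG_WAYS = 2          # never starve the background completely
SLOWDOWN_BUDGET = 0.05   # tolerate at most 5% foreground slowdown (assumption)


def write_schemata(bg_ways: int) -> None:
    """Give the background group the low `bg_ways` ways of cache id 0.

    CAT capacity masks must normally be contiguous, so build them as
    (1 << n) - 1 rather than an arbitrary bit pattern.
    """
    mask = (1 << bg_ways) - 1
    with open(os.path.join(BG_GROUP, "schemata"), "w") as f:
        f.write(f"L3:0={mask:x}\n")


def foreground_slowdown() -> float:
    """Placeholder: return the foreground's current slowdown relative to
    running alone (e.g. derived from request latencies or counters)."""
    raise NotImplementedError


def control_loop(interval_s: float = 1.0) -> None:
    """Simple feedback loop: expand the background partition while the
    foreground has slack, shrink it when the slowdown budget is exceeded."""
    bg_ways = MIN_BG_WAYS
    while True:
        write_schemata(bg_ways)
        time.sleep(interval_s)
        if foreground_slowdown() > SLOWDOWN_BUDGET:
            # Foreground is suffering: shrink the background partition.
            bg_ways = max(MIN_BG_WAYS, bg_ways - 1)
        else:
            # Foreground has slack: let the background use more ways.
            bg_ways = min(TOTAL_WAYS - 1, bg_ways + 1)
```

The single-way step size and the fixed slowdown budget are arbitrary choices for the sketch; the point is only that partition sizes can be steered by a cheap periodic measurement of foreground responsiveness rather than by offline profiling.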
